stanford parser
A Scalable Pipeline for Estimating Verb Frame Frequencies Using Large Language Models
Morgan, Adam M., Flinker, Adeen
We present an automated pipeline for estimating Verb Frame Frequencies (VFFs), the frequency with which a verb appears in particular syntactic frames. VFFs provide a powerful window into syntax in both human and machine language systems, but existing tools for calculating them are limited in scale, accuracy, or accessibility. We use large language models (LLMs) to generate a corpus of sentences containing 476 English verbs. Next, by instructing an LLM to behave like an expert linguist, we have it analyze the syntactic structure of the sentences in this corpus. This pipeline outperforms two widely used syntactic parsers across multiple evaluation datasets. Furthermore, it requires far fewer resources than manual parsing (the gold standard), thereby enabling rapid, scalable VFF estimation. Using the LLM parser, we produce a new VFF database with broader verb coverage, finer-grained syntactic distinctions, and explicit estimates of the relative frequencies of structural alternates commonly studied in psycholinguistics. The pipeline is easily customizable and extensible to new verbs, syntactic frames, and even other languages. We present this work as a proof of concept for automated frame frequency estimation, and release all code and data to support future research.
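As a rough illustration of what a VFF table contains, the sketch below turns a list of (verb, frame) observations into per-verb relative frequencies. The function name, the frame labels, and the toy observations are all invented for this example; it does not reproduce the paper's pipeline or its frame inventory.

```python
from collections import Counter, defaultdict


def verb_frame_frequencies(observations):
    """Return {verb: {frame: relative frequency}} from (verb, frame) pairs."""
    counts = defaultdict(Counter)
    for verb, frame in observations:
        counts[verb][frame] += 1

    vff = {}
    for verb, frame_counts in counts.items():
        total = sum(frame_counts.values())
        # Normalize within each verb so its frame frequencies sum to 1.
        vff[verb] = {frame: n / total for frame, n in frame_counts.items()}
    return vff


# Toy, made-up observations: one frame label per parsed sentence.
observations = [
    ("give", "NP NP"),
    ("give", "NP PP"),
    ("give", "NP NP"),
    ("give", "NP PP"),
    ("sleep", "intransitive"),
]

vff = verb_frame_frequencies(observations)
print(vff["give"])
print(vff["sleep"])
```

In a real setting the observations would come from a parser's output over a corpus, and the quality of the estimate depends on how accurately each sentence is assigned a frame.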
duncanka/Causeway
Causeway is a system for detecting explicit causal relations in text. It tags text using the BECAUSE 1.0 annotation scheme, described in Dunietz et al., 2015. The system itself is described in Dunietz et al., 2017. Note that the repository includes some code for reading in data in an updated version of the annotation scheme (BECAUSE 2.x). This newer scheme is backwards-compatible with the original.
The Stanford Natural Language Processing Group
A tokenizer divides text into a sequence of tokens, which roughly correspond to "words". We provide a class suitable for tokenization of English, called PTBTokenizer. It was initially designed to largely mimic Penn Treebank 3 (PTB) tokenization, hence its name, though over time the tokenizer has added quite a few options and a fair amount of Unicode compatibility, so in general it will work well over text encoded in the Unicode Basic Multilingual Plane that does not require word segmentation (such as writing systems that do not put spaces between words) or more exotic language-particular rules (such as writing systems that use : or ? as a character inside words). An ancillary tool uses this tokenization to provide the ability to split text into sentences. PTBTokenizer mainly targets formal English writing rather than SMS-speak.
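To give a feel for Penn Treebank style conventions, here is a minimal regex-based sketch in Python. It is not PTBTokenizer and covers only a few conventions (splitting "n't", common clitics, and punctuation); the function name `ptb_like_tokenize` is invented for this example, and abbreviations such as "Mr." or numbers such as "3.5" would be split incorrectly.

```python
import re


def ptb_like_tokenize(text):
    """Very small approximation of PTB-style tokenization for English."""
    # Split negative contractions: "don't" -> "do n't".
    text = re.sub(r"(?i)(\w)(n't)\b", r"\1 \2", text)
    # Split common clitics: "parser's" -> "parser 's", "we'll" -> "we 'll".
    text = re.sub(r"(?i)(\w)('(?:s|re|ve|ll|d|m))\b", r"\1 \2", text)
    # Surround punctuation marks with spaces so they become separate tokens.
    text = re.sub(r'([.,!?;:()"])', r" \1 ", text)
    # Whitespace now delimits every token.
    return text.split()


tokens = ptb_like_tokenize("They don't like the parser's output, do they?")
print(tokens)
```

A production tokenizer handles many more cases (quotes, hyphens, URLs, abbreviations), which is why a dedicated tool is normally preferred over hand-written patterns like these.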
Simple CoreNLP - Stanford.NLP.NET
In addition to the fully-featured annotator pipeline interface to CoreNLP, Stanford provides a simple API for users who do not need a lot of customization. The intended audience of this package is users of CoreNLP who want "just use nlp" to work as quickly and easily as possible, and do not care about the details of the behaviors of the algorithms. The API is included in CoreNLP releases from 3.6.0 onward. Visit the download page to download CoreNLP; make sure to set the current directory to the folder with the models! Intuitive Syntax: Conceptually, documents and sentences are stored as objects, and have functions corresponding to the annotations you would like to retrieve from them.
Stanford Parser and NLTK
Both tools change rather quickly, and the API might look very different 3-6 months from now, so please treat the following answer as temporary and not an eternal fix. Firstly, note that the Stanford NLP tools are written in Java and NLTK is written in Python. NLTK interfaces with the tools by calling the Java tool through the command-line interface. Secondly, the NLTK API to the Stanford NLP tools has changed quite a lot since version 3.1.
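The general idea of driving a Java tool from Python can be sketched as follows. This is not NLTK's wrapper code; it only builds the kind of command line that the Stanford Parser's own launcher script uses. The helper name `build_parser_command`, the classpath `stanford-parser/*`, and the input file `sentences.txt` are assumptions made for this example, and the command is printed rather than executed so the sketch runs without Java installed.

```python
import shlex


def build_parser_command(classpath, input_path,
                         model="edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz",
                         output_format="penn", max_heap="2g"):
    """Build an argument list that invokes the Stanford Parser's main class."""
    return [
        "java", f"-Xmx{max_heap}",
        "-cp", classpath,  # must contain the parser and models jars
        "edu.stanford.nlp.parser.lexparser.LexicalizedParser",
        "-outputFormat", output_format,
        model,
        input_path,
    ]


# Hypothetical locations; adjust to wherever the jars and input text live.
cmd = build_parser_command("stanford-parser/*", "sentences.txt")
print(shlex.join(cmd))

# To actually run it (requires Java and the parser jars on the classpath):
#   import subprocess
#   result = subprocess.run(cmd, capture_output=True, text=True, check=True)
#   print(result.stdout)
```

Passing the command as a list rather than a single shell string avoids quoting problems with paths that contain spaces or wildcard characters.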
The Stanford Natural Language Processing Group
A natural language parser is a program that works out the grammatical structure of sentences, for instance, which groups of words go together (as "phrases") and which words are the subject or object of a verb. Probabilistic parsers use knowledge of language gained from hand-parsed sentences to try to produce the most likely analysis of new sentences. These statistical parsers still make some mistakes, but commonly work rather well. Their development was one of the biggest breakthroughs in natural language processing in the 1990s. You can try out our parser online.
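One way a probabilistic parser can gain knowledge from hand-parsed sentences is by counting how often each grammar rule occurs in a treebank. The sketch below reads Penn-style bracketed trees and estimates rule probabilities by relative frequency. The two-sentence treebank and all function names are invented for illustration; this is not the Stanford Parser's training procedure, and it covers only estimation, not the search for the most likely parse.

```python
from collections import Counter, defaultdict


def read_tree(text):
    """Parse a Penn-style bracketed string into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse())      # a sub-phrase
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    return parse()


def count_rules(tree, counts):
    """Count one rule (label -> child labels or word) per tree node."""
    label, children = tree
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    counts[label][rhs] += 1
    for child in children:
        if isinstance(child, tuple):
            count_rules(child, counts)


def estimate_pcfg(treebank):
    """Relative-frequency estimate of P(rhs | lhs) from bracketed trees."""
    counts = defaultdict(Counter)
    for text in treebank:
        count_rules(read_tree(text), counts)
    return {lhs: {rhs: n / sum(rhss.values()) for rhs, n in rhss.items()}
            for lhs, rhss in counts.items()}


# Toy hand-parsed sentences (made up for this example).
treebank = [
    "(S (NP (DT the) (NN dog)) (VP (VBD barked)))",
    "(S (NP (DT the) (NN cat)) (VP (VBD saw) (NP (DT a) (NN dog))))",
]

pcfg = estimate_pcfg(treebank)
print(pcfg["VP"])
print(pcfg["NN"])
```

With rule probabilities like these, a parser can score competing analyses of a new sentence and prefer the one whose rules are jointly most probable.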
Unsupervised Word Sense Disambiguation Using Markov Random Field and Dependency Parser
Chaplot, Devendra Singh (Samsung Electronics Co., Ltd.) | Bhattacharyya, Pushpak (IIT Bombay) | Paranjape, Ashwin (Stanford University)
Word Sense Disambiguation (WSD) is a difficult problem to solve in the unsupervised setting. This is because in this setting inference becomes more dependent on the interplay between different senses in the context due to the unavailability of learning resources. Using two basic ideas, sense dependency and selective dependency, we model the WSD problem as a Maximum A Posteriori (MAP) Inference Query on a Markov Random Field (MRF) built using WordNet and Link Parser or Stanford Parser. To the best of our knowledge this combination of dependency and MRF is novel, and our graph-based unsupervised WSD system beats the state-of-the-art system on the SensEval-2, SensEval-3 and SemEval-2007 English all-words datasets while being over 35 times faster.
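The idea of MAP inference over senses connected by dependency edges can be illustrated on a tiny example. The sketch below scores every joint sense assignment with per-word (unary) and per-edge (pairwise) log-potentials and keeps the best one. All sense labels and scores are made up, and exhaustive enumeration is used only because the example is tiny; the paper's potentials (derived from WordNet) and its inference procedure are not reproduced here.

```python
from itertools import product


def map_assignment(senses, unary, pairwise, edges):
    """Return the highest-scoring joint sense assignment and its score."""
    words = list(senses)
    best, best_score = None, float("-inf")
    # Enumerate every combination of one sense per word.
    for combo in product(*(senses[w] for w in words)):
        assignment = dict(zip(words, combo))
        # Unary terms: how plausible each sense is on its own.
        score = sum(unary[w][assignment[w]] for w in words)
        # Pairwise terms: compatibility of senses joined by a dependency edge.
        score += sum(pairwise.get((assignment[a], assignment[b]), 0.0)
                     for a, b in edges)
        if score > best_score:
            best, best_score = assignment, score
    return best, best_score


# Toy problem with invented log-potentials.
senses = {
    "bank": ["bank#river", "bank#finance"],
    "deposit": ["deposit#sediment", "deposit#money"],
}
unary = {
    "bank": {"bank#river": -0.7, "bank#finance": -0.7},
    "deposit": {"deposit#sediment": -1.2, "deposit#money": -0.4},
}
pairwise = {
    ("bank#finance", "deposit#money"): 1.0,
    ("bank#river", "deposit#sediment"): 1.0,
}
edges = [("bank", "deposit")]  # one dependency link between the two words

best, best_score = map_assignment(senses, unary, pairwise, edges)
print(best)
```

Enumeration grows exponentially with the number of words, so realistic systems rely on structured or approximate inference instead.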
Semantic Analysis of English Specification of OCL
Bajwa, Imran Sarwar (University of Birmingham) | Lee, Mark (University of Birmingham) | Bordbar, Behzad (University of Birmingham)
In this paper, we present a novel approach, NL2OCL, for translating English specifications of constraints into formal constraints such as OCL (Object Constraint Language). In this approach, input English constraints are syntactically and semantically analyzed to generate an SBVR (Semantics of Business Vocabulary and Rules) based logical representation that is finally mapped to OCL. During the syntactic and semantic analysis we also address various syntactic and semantic ambiguities, which makes the presented approach robust. The approach is implemented in Java as a proof of concept. We also use our tool on a case study to evaluate the accuracy of the approach. The evaluation results are compared with those of a pattern-based approach to highlight the significance of our approach.